Abstract


Neuropeptides play a variety of roles in many physiological processes and
serve as potential therapeutic targets for the treatment of some
nervous-system disorders. In recent years, there has been a tremendous
increase in the number of identified neuropeptides. Therefore, we have
developed NeuroPep, a comprehensive resource of neuropeptides, which holds
5949 non-redundant neuropeptide entries originating from 493 organisms and
belonging to 65 neuropeptide families. In NeuroPep, the number of
neuropeptides in invertebrates and vertebrates is 3455 and 2406,
respectively. It is currently the most complete neuropeptide database. We
extracted entries deposited in UniProt, the database www.neuropeptides.nl and
NeuroPedia, and used text mining methods to retrieve entries from MEDLINE
abstracts and full text articles. All the entries in NeuroPep have been
manually checked. Of the 5949 neuropeptide sequences, 2069 (35%) were
collected from the scientific literature. Moreover, NeuroPep contains
detailed annotations for each entry, including source organisms, tissue
specificity, families, names, post-translational modifications, 3D
structures (if available) and literature references. Information derived
from these peptide sequences, such as amino acid compositions, isoelectric
points, molecular weight and other physicochemical properties, is also
provided. A quick search feature allows users to search the database with
keywords such as sequence, name, family, etc., and an advanced search page
helps users to combine queries with logical operators like AND/OR. In
addition, user-friendly web tools for browsing, sequence alignment and
mapping are also integrated into the NeuroPep database.

Database URL: http://isyslab.info/NeuroPep


Introduction 


Neuropeptides are small proteinaceous substances that are produced by
neurons, released in a regulated fashion, and act either on neural
substrates, such as neurons and glial cells, or on non-neuronal target cells,
such as a gland or muscle (1). It has been shown that different neuropeptides
are involved in a number of physiological processes such as food intake,
metabolism, stress control, pain perception, social behaviors, learning and
memory (2-5). Neuropeptides are typically 3-100 amino acid residues long and
are produced from larger precursor molecules by a series of
post-translational processing steps (6). There are two general approaches to
discovering neuropeptides: ‘function-first’ and ‘peptide-first’. The
function-first approach is based on bioassays, receptor-binding assays and
genetic analysis, while the peptide-first approach is based on cDNA cloning
of precursors, the use of neuropeptide precursor processing enzymes,
peptidomics, etc. (7). Since the first neuropeptide, substance P, was
discovered by von Euler and Gaddum in 1931 (8) and sequenced in 1971 (9),
there has been a tremendous increase in the number of identified
neuropeptides over the last 40 years. Furthermore, due to their wide range of
roles in health and disease, neuropeptides are considered attractive
therapeutic targets for nervous-system disorders such as depression, anxiety
and Parkinson’s disease (10-12). Several databases, such as EROP-Moscow (13),
SwePep (14), PeptideDB (15) and PepBank (16), have been developed to hold
general bioactive peptides. There are also databases such as APD2 (17), CAMP
(18) and DAMPD (19), which were designed specifically for antimicrobial
peptides. However, there are only two recent databases specific for
neuropeptides.

About 90 genes encoding classical and candidate neuropeptides in mammalian
genomes are provided in the online neuropeptide database
www.neuropeptides.nl (1). It lists genes, precursors and the names of the
processed active peptides, and also provides hyperlinks to bioinformatics
databases on genomes, transcripts, brain expression and homologies to other
species. The NeuroPedia database was mainly designed for identification of
neuropeptides using mass spectrometry data (20). It contains 847 neuropeptide
sequences from eight organisms belonging to the phylum Chordata, together
with 3401 identified spectra from NIST spectral libraries and the authors’
own in-house spectral datasets (20). Since the two databases were specialized
for their authors’ studies, many invertebrate-specific neuropeptides, such as
allatostatins, pyrokinins, crustacean cardioactive peptides and
pigment-dispersing factors, were not considered, and neither database gave
detailed annotations for each neuropeptide sequence. Although the
characterization of invertebrate neuropeptides has lagged behind that of
vertebrate neuropeptides, many new invertebrate neuropeptides have been
identified in recent years owing to the development of mass spectrometry
techniques (21-23).

Therefore, this study focused on collecting as many known neuropeptide
sequences and related information as possible and integrating them into a
searchable archive. With this in mind, we not only extracted entries from the
public resources UniProt (24), www.neuropeptides.nl and NeuroPedia, but also
retrieved neuropeptide sequences from the abstracts and full texts of
articles in MEDLINE using text-mining methods. To better serve the community,
the database also provides comprehensive information for each entry and
integrates user-friendly browsing and search facilities along with useful
tools such as BLAST (25), ClustalW (26) and Map. We hope that this database
will be helpful for the research community and will be a valuable resource
for the development of neuropeptide-based therapeutics.


Materials and methods 


Data collection and compilation 


NeuroPep is a comprehensive resource of neuropeptides. The data were
collected from various resources including MEDLINE abstracts, full papers,
UniProt, and the databases www.neuropeptides.nl (1) and NeuroPedia (20).
First, we searched PubMed with the keyword ‘neuropeptide’, which returned
240 388 articles, of which only 7277 contained peptide sequences. The
Peptide::Pubmed (16) method was used to extract the peptide sequences from
the abstracts of these articles. In total, 10 515 peptide sequences were
extracted from the 7277 abstracts. To check whether a sequence extracted from
an abstract is a neuropeptide, we extracted the peptide name or the term
describing the peptide sequence and compared it with a neuropeptide name list
built from www.neuropeptides.nl and UniProt. Many of these sequences turned
out to be analogs, antagonists, agonists, fragments, motifs, etc. After
filtering, 2595 natural neuropeptide sequences remained. We also mined name,
family, modification and organism information from the corresponding texts.
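
The name-based filtering step described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the pipeline actually used for NeuroPep: the function name `is_natural_neuropeptide`, the `EXCLUDE_TERMS` tuple and the idea of matching on a free-text description are all hypothetical.

```python
# Hypothetical sketch of the name-based filter; not the authors' code.
# Terms that mark a sequence as something other than a natural neuropeptide.
EXCLUDE_TERMS = ("analog", "antagonist", "agonist", "fragment", "motif")


def is_natural_neuropeptide(description, known_names):
    """Return True if the description mentions a known neuropeptide name
    and does not flag the sequence as an analog, antagonist, agonist,
    fragment or motif."""
    text = description.lower()
    if any(term in text for term in EXCLUDE_TERMS):
        return False
    return any(name.lower() in text for name in known_names)
```

In practice the name list would come from www.neuropeptides.nl and UniProt, and borderline cases would still need the manual check described in the text.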
Furthermore, we wrote a Perl script to select articles whose full texts may
contain neuropeptide sequences identified by MS/MS. The script searched the
title/abstract fields for the keywords ‘Neuropeptide’ and ‘Mass Spectrometry’
and returned 579 of the 240 388 articles. Neuropeptides described in the full
text articles were selected based on the family or peptide names mentioned in
the articles. In total, 3471 sequences, along with their name, family and
organism information, were collected manually from the full texts. In
addition, neuropeptide precursors were selected from the UniProt entries
(release 2013-05) that were annotated as ‘neuropeptide’ in the ‘keyword’
line. From these precursors, 2715 neuropeptide sequences annotated as
‘peptide’ or ‘chain’ in the ‘Feature’ line were retrieved. Moreover, we
downloaded the corresponding UniProt protein file according to the UniProt ID
from the blink results of all the neuropeptide precursors listed in the
database www.neuropeptides.nl; 2179 sequences were further retrieved from
these protein files. Combined with the 847 sequences collected from the
NeuroPedia database, the integrated dataset contained 11 807 sequences. When
removing redundancy according to sequence and organism, we preferred to keep
the UniProt entries. As a result, 5949 non-redundant entries remained.
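
A minimal sketch of the redundancy-removal rule (one entry per sequence and organism, with UniProt entries preferred) is given below. The dictionary keys `sequence`, `organism` and `source`, and the function name, are assumptions made for this example; the text does not describe how the step was actually implemented.

```python
def remove_redundancy(entries):
    """Keep one entry per (sequence, organism) pair.

    When duplicates exist, an entry whose source is 'UniProt' replaces a
    previously kept entry from another source. Field names are hypothetical.
    """
    kept = {}
    for entry in entries:
        key = (entry["sequence"].upper(), entry["organism"].lower())
        current = kept.get(key)
        prefer_new = (
            current is None
            or (entry["source"] == "UniProt" and current["source"] != "UniProt")
        )
        if prefer_new:
            kept[key] = entry
    return list(kept.values())
```

The same sequence found in two different organisms is treated as two distinct entries, which matches the description of redundancy being defined by both sequence and organism.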


NeuroPep server and web interface 


We constructed the NeuroPep server using Apache Tomcat 
Server 7 with MySQL Community Server 5.6. HTML, JSP 
and Ajax were used to build the front-end, and Java and 
JDBC were used to implement the web services and data 
management. The Highcharts JavaScript package was used
to add interactive plots to the web site. An overview of the 
user interface of NeuroPep database is shown in Figure 1. 


Data organization 


The data in NeuroPep database are organized into the fol- 
lowing fields (Figure 1B): 


Accession number: A unique identifier to tag each database 
entry. Each accession number begins with the characters 
‘NP’ followed by five digits. 

Name: The name of the neuropeptide which was collected
from the literature or UniProt.

Figure 1. An overview of the user interface of NeuroPep. (A) The browse output of NPY neuropeptide family. (B) An example of an entry NP03900 of
the NPY family. (C) The properties page of the entry NP03900. (D) The structure view page of the entry NP03900.

Organism: The scientific name of the organism with the
neuropeptide sequence.

Tissue Specificity: The tissue in which the neuropeptide is
expressed.

Family: The classification of the neuropeptide. It was extracted
from the ‘SIMILARITY’ line of UniProt or from the literature. In
addition, entries without family information and neuropeptide-like
proteins in phylum Nematoda were annotated as ‘NA’.

UniProt ID: the accession number in UniProt (if available) 
and a link to its UniProt entry. 

Modification: The type of post-translational modification 
and the position in the sequence for each modified 
residue. 

Sequence: The amino acid sequence of the peptide. 

Structure: The three-dimensional structure of the neuropeptide
from the PDB database. Jmol was integrated into the database to
facilitate structure visualization.

Reference: The source of the neuropeptide sequence. 
External links to the abstracts of the articles are also 
provided. 
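
As a small illustration of the accession number format described above (‘NP’ followed by five digits), a format check could look like the sketch below. The function name is hypothetical, and the validation logic used by the NeuroPep server itself is not described in the text.

```python
import re

# 'NP' followed by exactly five digits, as described for NeuroPep accessions.
ACCESSION_PATTERN = re.compile(r"NP\d{5}")


def is_valid_accession(text):
    """Return True if text has the form NP + five digits (e.g. NP03900)."""
    return ACCESSION_PATTERN.fullmatch(text) is not None
```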


In addition to the directly extracted information above, we computed the
frequency of each of the 20 standard amino acid types, the isoelectric point
and the molecular weight of each neuropeptide. We also computed the frequency
of each special type of amino acid (positively charged, negatively charged,
hydrophilic and hydrophobic) to show users which types of amino acid each
neuropeptide favors. This information is shown in the properties field
(Figure 1C).
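
The composition and molecular weight calculations can be sketched as follows. This is an illustrative sketch, not the NeuroPep implementation: it uses commonly tabulated average residue masses, ignores post-translational modifications, and the residue groupings `POSITIVE` and `NEGATIVE` (in particular whether His counts as positively charged) are assumptions. The isoelectric point is omitted because it depends on a choice of pKa values that the text does not specify.

```python
from collections import Counter

# Commonly tabulated average residue masses in daltons (residue = amino acid
# minus one water). One water is added back per peptide chain.
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER_MASS = 18.01528

# Assumed residue classes for this example only.
POSITIVE = set("KRH")
NEGATIVE = set("DE")


def composition(seq):
    """Fraction of each of the 20 standard amino acids in seq."""
    if not seq:
        raise ValueError("empty sequence")
    counts = Counter(seq.upper())
    return {aa: counts[aa] / len(seq) for aa in RESIDUE_MASS}


def molecular_weight(seq):
    """Average molecular weight of an unmodified linear peptide."""
    return sum(RESIDUE_MASS[aa] for aa in seq.upper()) + WATER_MASS


def class_fraction(seq, residues):
    """Fraction of residues in seq that belong to the given class."""
    if not seq:
        raise ValueError("empty sequence")
    return sum(aa in residues for aa in seq.upper()) / len(seq)
```

For a modified peptide (for example a C-terminally amidated one) the mass shift of each modification would have to be added separately.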


Data retrieval 


A powerful browsing facility allows a user to browse the database by three
major categories: neuropeptide family, organism and modification. There are
65 neuropeptide families and 493 organisms in the current database, and they
are presented in alphabetical order. The modification field includes the five
most common modifications occurring in neuropeptide sequences: amidation,
acetylation, pyroglutamination, sulfation and phosphorylation. The browse
output can be sorted by clicking on a column title. The sequences can be
downloaded in FASTA format, and all the information can be downloaded in txt
format.
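
For readers unfamiliar with the FASTA format mentioned above, the sketch below shows how a list of entries could be serialized. The header layout (identifier followed by name), the field names and the 60-character line width are assumptions for illustration; the exact header used in NeuroPep downloads is not stated in the text.

```python
def to_fasta(entries, width=60):
    """Serialize entries as FASTA text: a '>' header line per entry followed
    by the sequence wrapped to a fixed line width."""
    lines = []
    for entry in entries:
        lines.append(f">{entry['id']} {entry['name']}")
        seq = entry["sequence"]
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"
```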

Furthermore, users can query the database by two types 
of search tools: quick search and advanced search. Quick 
search enables users to search the database by the follow- 
ing fields: NPID, organism, family, name, UniProt ID, se- 
quence and PMID. When the sequence field is chosen, all 
the neuropeptide sequences containing the input sequence 


Database, Vol. 2015, Article ID bav038 


will be returned. Advanced search allows users to build 
complex search queries using logical conditions like AND/ 
OR. In addition, advanced search allows users to specify a 
search range of the fields including length, molecular 
weight and isoelectric point. 
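
The two search modes can be illustrated with a small in-memory sketch. It is only a model of the behavior described above (substring matching for sequence queries, AND/OR combinations and numeric ranges for advanced search); the real server queries a MySQL database, and every function and field name here is hypothetical.

```python
def quick_search_sequence(entries, query):
    """Return entries whose sequence contains the query as a substring."""
    q = query.upper()
    return [e for e in entries if q in e["sequence"].upper()]


def field_equals(field, value):
    """Condition: the given field equals the given value."""
    return lambda e: e[field] == value


def in_range(field, low, high):
    """Condition: low <= field <= high (e.g. length, weight, pI)."""
    return lambda e: low <= e[field] <= high


def all_of(*conditions):
    """Logical AND of several conditions."""
    return lambda e: all(c(e) for c in conditions)


def any_of(*conditions):
    """Logical OR of several conditions."""
    return lambda e: any(c(e) for c in conditions)


def advanced_search(entries, condition):
    """Return entries that satisfy a (possibly nested) condition."""
    return [e for e in entries if condition(e)]
```

A query such as "family is Tachykinin AND length between 5 and 20" would then be written as `all_of(field_equals("family", "Tachykinin"), in_range("length", 5, 20))`.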


Integration of web tools 


To facilitate analysis of neuropeptide sequences, various 
web-based tools have been integrated into NeuroPep. The 
following is a brief description of these tools. 


BLAST: To find sequences in NeuroPep that are similar to a
user-provided sequence, we have incorporated the BLAST search
tool on the website. Users submit a sequence in FASTA format and
choose parameters including the E-value cutoff and the
substitution matrix for sequence alignment. The results are shown
in the standard BLAST output format, which includes the matching
sequences, BLAST score and E-value.

ClustalW: ClustalW, one of the most commonly used tools 
for multiple sequence alignment, is also integrated into 
the database to help users find the conserved motifs in a 
group of neuropeptide sequences. The input consists of 
multiple peptide sequences in FASTA format, and the 
output is the standard ClustalW multiple sequence align- 
ment format. 

Map: Given the sequence of a neuropeptide precursor, one 
may need to find all possible processed neuropeptides in 
the database. To facilitate this task, we have developed 
the Map tool. Map finds all peptides in the database that 
match exactly to a substring in the user-provided se- 
quence. For each output sequence, the starting position in 
the input sequence where the matching substring begins 
is also provided. 
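
The behavior of the Map tool, as described above, amounts to exact substring matching of every database peptide against the submitted precursor. A minimal sketch is shown below; the function name, the dictionary of peptides and the use of 1-based positions are assumptions, since the text does not say how positions are numbered or how the search is implemented.

```python
def map_peptides(precursor, peptides):
    """Find every exact occurrence of each peptide in the precursor.

    peptides maps an identifier to a sequence. Returns a list of
    (identifier, start) tuples, where start is the 1-based position in the
    precursor at which the matching substring begins.
    """
    precursor = precursor.upper()
    hits = []
    for peptide_id, seq in peptides.items():
        seq = seq.upper()
        start = precursor.find(seq)
        while start != -1:
            hits.append((peptide_id, start + 1))
            # Continue searching one residue further to catch repeats.
            start = precursor.find(seq, start + 1)
    return hits
```

A linear scan like this is adequate for a few thousand short peptides; an index such as a suffix structure would only matter for much larger collections.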


Submission 


Users can submit their published or newly discovered data to NeuroPep via
the online submission form. The NeuroPep server will update the database
automatically once a newly submitted entry has been confirmed by us. A
confirmation email will be sent to the submitter after validation.
Periodically, we will also search for and add new entries from newly
published literature and UniProt.


Results and discussion 


After removing redundancy and manual checking, the current release of the
database (version 1.0, 2014-11-26) holds 5949 non-redundant entries, among
which 4035 neuropeptide sequences have been confirmed by at least one
reference. The 5949 non-redundant entries cover 493 organisms and belong to
65 different neuropeptide families. The 493 organisms can be classified into
nine phyla: Annelida, Arthropoda, Chordata, Cnidaria, Echinodermata,
Hemichordata, Mollusca, Nematoda and Platyhelminthes. Table 1 lists all 65
neuropeptide families together with the number of neuropeptides from each
source and the phyla distribution of each family.


Source distribution of neuropeptide families 


The entries of NeuroPep were derived primarily from two types of sources.
The first consists of public databases including UniProt,
www.neuropeptides.nl and NeuroPedia. The second consists of scientific
literature sources including MEDLINE abstracts and full texts. In total, 3880
sequences were collected from the public repositories (UniProt: 2216;
databases www.neuropeptides.nl and NeuroPedia: 1664). All 1664 neuropeptide
sequences from www.neuropeptides.nl and NeuroPedia have UniProt entries, but
they are not annotated as ‘neuropeptide’ in the keyword line of UniProt;
examples include classical neuropeptides such as cerebellins, calcitonins,
somatostatins and GnRHs. To check whether the 2069 sequences from the
literature have UniProt entries that lack the ‘neuropeptide’ annotation,
these sequences were compared with all the ‘peptides’ and ‘chains’ in
UniProt. This identified 117 such sequences, and the UniProt ID of each was
added to NeuroPep. As can be seen from Table 1, there are three families
(calcitonin-like peptide, ecdysis triggering hormone and orcokinin) for which
>90% of the entries were collected from the literature. All 13 members of the
calcitonin-like peptide and ecdysis triggering hormone families were
extracted from the scientific literature. The orcokinin family contains 191
entries covering 16 organisms. Only five entries of Apis mellifera, two of
Orconectes limosus, six of Procambarus clarkii and six of Rhodnius prolixus
were collected from UniProt; the other 172 entries, from 13 different
organisms, were collected from the literature. For another 45 neuropeptide
families, 1-89% of the entries were obtained from text mining. For the
remaining 17 neuropeptide families, all the entries were retrieved from the
public resources.
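
The per-family literature share discussed above is a simple ratio of the literature count to the total count. The sketch below reproduces it for a few families using the (Lit, SP, NL) counts reported in Table 1; the function and variable names are hypothetical and the selection of families is only for illustration.

```python
def literature_share(lit, sp, nl):
    """Fraction of a family's entries that came from the literature."""
    total = lit + sp + nl
    return lit / total if total else 0.0


# (Lit, SP, NL) counts taken from Table 1 for a few example families.
families = {
    "Calcitonin-like peptide": (6, 0, 0),
    "Ecdysis triggering hormone": (7, 0, 0),
    "Orcokinin": (172, 19, 0),
    "Leptin": (0, 0, 21),
}

# Families for which more than 90% of entries were collected from literature.
mostly_literature = [
    name for name, counts in families.items() if literature_share(*counts) > 0.9
]
```

For orcokinin this gives 172 of 191 entries, which is just over 90%.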


Phyla distribution of neuropeptide families 


Fifteen of the 65 neuropeptide families contain peptides from at least two
different phyla. The most widely distributed family, the FMRFamide-related
peptide family, spans seven phyla: Annelida, Arthropoda, Chordata, Cnidaria,
Mollusca, Nematoda and Platyhelminthes. The other 50 neuropeptide families in
the current version of the database are restricted to a single phylum. Eight
families, including pyrokinin, orcokinin and arthropod PDH, are Arthropoda
specific. There are 38 neuropeptide families, such as bradykinin, calcitonin,
cerebellin and somatostatin, which are only found in Chordata. All the
egg-laying hormones and small cardioactive peptides were identified in
Mollusca, and all the LWamide neuropeptides and all the YGGW-amide-related
peptides are from Cnidaria and Nematoda, respectively.


Neuropeptide and family distribution among 
phyla 

The neuropeptide frequency distribution among phyla is given in Figure 2A.
More than 44% (2635) of the neuropeptides are found in the phylum Arthropoda.
The phylum Chordata has the second largest number of neuropeptides (2457,
41%). The organisms with the most identified neuropeptides in Arthropoda and
Chordata are Callinectes sapidus (170) and Rattus norvegicus (294),
respectively. The organisms with the most identified neuropeptides in the
remaining phyla are as follows: Caenorhabditis elegans (Nematoda, 226),
Aplysia californica (Mollusca, 150), Anthopleura elegantissima (Cnidaria,
18), Eisenia foetida (Annelida, 7), Fasciola hepatica (Platyhelminthes, 5),
Strongylocentrotus purpuratus (Echinodermata, 8) and Saccoglossus
kowalevskii (Hemichordata, 2).

Figure 2B shows the number of neuropeptide families in each phylum. Chordata
contains the highest number of neuropeptide families (48), followed by
Arthropoda (21) and Mollusca (12). Comparing Figure 2A and B shows that
Arthropoda has more identified neuropeptides than Chordata, but Chordata has
more families.


Amino acid and neuropeptide length distribution 


The amino acid composition and length distribution of neuropeptides in the
current database are shown in Figure 3A and B, respectively. Residues such as
Leu, Ala, Ser, Glu and Gly are the most abundant, while Trp, Cys, Met, His
and Tyr are the least abundant. We have also computed separately the amino
acid composition of invertebrate and vertebrate neuropeptides in the NeuroPep
database. The Phe abundance in invertebrate neuropeptides is 2.6 times that
in vertebrate neuropeptides, while the Lys abundance in vertebrate
neuropeptides is 3 times that in invertebrate neuropeptides (Supplementary
Figure S1). As shown in Figure 3B, the majority of the neuropeptides (79%)
are shorter than 30 amino acids. There are 18 entries in the database with
the shortest length of three amino acids. A small portion (6%) of the
sequences are >100 amino acids in length, most of which (341/346) are
annotated as ‘CHAIN’ in the feature line of UniProt.

Table 1. The number of neuropeptides from each data source and phyla distribution for the 65 neuropeptide families in NeuroPep

Family name                                           Lit   SP    NL    Phyla
7B2                                                   1     10    8     Chordata
ACBP                                                  0     0     19    Chordata
Adrenomedullin                                        0     0     15    Chordata
AKH/HRTH/RPCH                                         69    113   0     Arthropoda
Allatostatin                                          275   86    0     Arthropoda; Nematoda
Apelin                                                0     0     16    Chordata
Arthropod CHH/MIH/GIH/VIH hormone                     68    49    0     Arthropoda
Arthropod PDH                                         33    4     0     Arthropoda
Bombesin/neuromedin B/ranatensin                      11    0     22    Chordata
Bradykinin                                            9     0     7     Chordata
Calcitonin                                            4     0     39    Chordata
Calcitonin-like peptide                               6     0     0     Arthropoda
CART                                                  2     10    7     Chordata
CCAP                                                  10    6     0     Arthropoda; Mollusca
Cerebellin                                            4     0     20    Chordata
Chromogranin/secretogranin                            21    0     77    Chordata
Corazonin                                             17    46    0     Arthropoda; Mollusca
Cystatin                                              0     0     22    Chordata
Ecdysis triggering hormone                            7     0     0     Arthropoda
Endothelin/sarafotoxin                                0     29    1     Chordata
FMRFamide related peptide                             455   406   6     Arthropoda; Mollusca; Nematoda; Annelida; Platyhelminthes; Chordata; Cnidaria
Galanin                                               7     19    4     Chordata
Gastrin/cholecystokinin                               35    86    112   Arthropoda; Chordata
Glucagon                                              14    0     228   Chordata
GnRH                                                  32    0     42    Chordata; Mollusca
Insulin                                               9     4     194   Chordata; Mollusca; Arthropoda
Kinin                                                 20    19    1     Arthropoda; Chordata
KISS1                                                 3     0     14    Chordata
Leptin                                                0     0     21    Chordata
LWamide neuropeptide                                  3     23    0     Cnidaria
Melanin-concentrating hormone                         3     33    4     Chordata
Molluscan ELH                                         0     43    0     Mollusca
Motilin                                               1     42    2     Chordata
Myomodulin                                            2     18    0     Annelida; Mollusca
Myosuppressin                                         23    20    0     Arthropoda
NAPRTase                                              0     0     4     Chordata
Natriuretic peptide                                   1     0     83    Chordata
Neurexophilin                                         0     0     16    Chordata
Neuromedins                                           6     7     2     Chordata
Neuropeptide B/W                                      1     11    0     Chordata
Neuropeptide S                                        0     4     0     Chordata
Neurotensin                                           15    0     21    Chordata
NPY                                                   100   93    33    Arthropoda; Chordata; Mollusca; Platyhelminthes
Nucleobindin                                          0     0     11    Chordata
Opioid                                                56    117   2     Annelida; Chordata; Mollusca
Orcokinin                                             172   19    0     Arthropoda
Orexin                                                0     11    0     Chordata
Parathyroid hormone                                   0     4     19    Chordata
Periviscerokinin                                      19    263   0     Annelida; Arthropoda
POMC                                                  3     0     204   Chordata
ProSAAS                                               1     27    0     Chordata
Pyrokinin                                             56    210   0     Arthropoda
Resistin/FIZZ                                         0     0     7     Chordata
RFamide neuropeptide                                  0     5     0     Chordata
Sauvagine/corticotropin-releasing factor/urotensin I  5     0     20    Chordata; Arthropoda
SCP                                                   2     5     0     Mollusca
Serpin                                                17    0     37    Chordata
Somatostatin                                          22    0     33    Chordata
Somatotropin/prolactin                                0     0     101   Chordata
Tachykinin                                            84    185   0     Annelida; Arthropoda; Chordata; Mollusca
TRH                                                   1     0     17    Chordata
Urotensin-2                                           4     0     8     Chordata
Vasopressin/oxytocin                                  7     1     77    Arthropoda; Chordata; Mollusca
VGF                                                   6     0     21    Chordata
YGGW-amide related peptide                            0     14    0     Nematoda

Source columns: Lit denotes literature; SP denotes UniProt; NL denotes the
databases www.neuropeptides.nl and NeuroPedia.
ACBP, Acyl-CoA-binding protein; AKH/HRTH/RPCH, Adipokinetic
hormone/Hypertrehalosaemic factor/Red pigment-concentrating hormone;
CHH/MIH/GIH/VIH, Crustacean hyperglycemic hormones/Molt-inhibiting
hormone/Gonad-inhibiting hormone/Vitellogenesis-inhibiting hormone; PDH,
Pigment-dispersing hormone; CCAP, Crustacean cardioactive peptide; ELH,
Egg-laying hormone; SCP, Small cardioactive peptide; CART,
Cocaine-amphetamine-regulated transcript protein; POMC, Pro-opiomelanocortin;
TRH, Thyrotropin-releasing hormone.

Figure 2. (A) The neuropeptide frequency distribution based on phyla. (B) The neuropeptide family distribution based on phyla.


Comparison to other existing neuropeptide 
databases 


We compared our database, NeuroPep, to the other two
neuropeptide-specific databases: the database at
www.neuropeptides.nl (1) and the database NeuroPedia (20).
The developers of the database www.neuropeptides.nl 
mainly traced back the origin and the development of the 
concept of neuropeptide and proposed a conservative def- 
inition of neuropeptides. Based on the definition, they col- 
lected and analyzed over 90 genes encoding classical or 
candidate neuropeptides of the mammalian genomes. 


NeuroPedia was mainly designed for identification of 
neuropeptides using mass spectrometry data. It offered 847 
neuropeptide sequences and the corresponding spectra 
data along with spectral library search tools. 

Compared with the above two neuropeptide databases, NeuroPep has more
experimentally validated neuropeptide sequences and covers more neuropeptide
families and organisms. The current version of our database contains 5949
neuropeptide sequences belonging to 65 different neuropeptide families and
covering 493 different organisms of nine different phyla. There are 16
invertebrate neuropeptide families, such as allatostatin, periviscerokinin,
orcokinin and pyrokinin, that were not included in the databases
www.neuropeptides.nl and NeuroPedia; together they account for 28% of the
NeuroPep database. The members of these 16 families are from Arthropoda,
Mollusca, Nematoda, Annelida or Cnidaria. Note that 35% of the entries of our
database were collected by text mining of MEDLINE abstracts and full text
articles and have not yet been collected or annotated by other public
resources. Moreover, NeuroPep offers comprehensive annotations for each entry
and user-friendly search facilities. Finally, powerful analysis tools
including BLAST, ClustalW and Map are also integrated in our database.
Therefore, we think the NeuroPep database will be a useful resource for the
research community.

Figure 3. (A) The amino acid composition distribution of neuropeptides in NeuroPep database. (B) The amino acid length distribution of neuropeptides in NeuroPep database.


Summary and future perspectives 


NeuroPep is a comprehensive neuropeptide database, which holds 5949
neuropeptide sequences originating from 493 organisms of nine different
phyla. It is the most complete neuropeptide-specific database to date. The
database offers a user-friendly interface coupled with powerful browsing,
searching and analysis tools. It allows users to submit new entries online.
Each new entry will be validated before it is incorporated into NeuroPep. It
is very time-consuming to manually check the peptide sequences and extract
related information from scientific papers. Therefore, a new text mining tool
based on natural language processing is currently under development. It will
be integrated into NeuroPep to update the database automatically as soon as
novel neuropeptides become available.


Supplementary Data 


Supplementary data are available at Database Online. 




Acknowledgements 


We thank Xinlong Jiang, Di Pi, Feng He and Jing Tang for their helpful
advice on improving the database. We are grateful to the authors and curators
of the resources we used: the databases www.neuropeptides.nl, NeuroPedia,
UniProt and MEDLINE.


Funding 


This work was supported by grants from the National Natural 
Science Foundation of China [30700162, 61073095] and the 
Fundamental Research Funds for the Central Universities of China 
[2014TS138]. 


Conflict of interest. None declared. 


References 


1. Burbach,J.P. (2010) Neuropeptides from concept to online database
   www.neuropeptides.nl. Eur. J. Pharmacol., 626, 27-48.
2. Hokfelt,T., Broberger,C., Xu,Z.Q. et al. (2000) Neuropeptides - an
   overview. Neuropharmacology, 39, 1337-1356.
3. Sobrino Crespo,C., Perianes Cachero,A., Puebla Jimenez,L. et al. (2014)
   Peptides and food intake. Front. Endocrinol., 5, 58.
4. Shahjahan,M., Kitahashi,T. and Parhar,I.S. (2014) Central pathways
   integrating metabolism and reproduction in teleosts. Front. Endocrinol.,
   5, 36.
5. Kormos,V. and Gaszner,B. (2013) Role of neuropeptides in anxiety, stress,
   and depression: from animals to humans. Neuropeptides, 47, 401-419.
6. Malenka,R. (2010) Intercellular Communication in the Nervous System.
   Academic Press, New York.
7. Fricker,L.D. (2012) Colloquium Series on Neuropeptides, Vol. 1. Morgan &
   Claypool Life Sciences, San Rafael, pp. 1-122.
8. von Euler,U.S. and Gaddum,J.H. (1931) An unidentified depressor substance
   in certain tissue extracts. J. Physiol., 72, 74-87.
9. Chang,M.M., Leeman,S.E. and Niall,H.D. (1971) Amino-acid sequence of
   substance P. Nat. New Biol., 232, 86-87.
10. Hoyer,D. and Bartfai,T. (2012) Neuropeptides and neuropeptide receptors:
    drug targets, and peptide and non-peptide ligands: a tribute to Prof.
    Dieter Seebach. Chem. Biodivers., 9, 2367-2387.
11. Holmes,A., Heilig,M., Rupniak,N.M. et al. (2003) Neuropeptide systems as
    novel therapeutic targets for depression and anxiety disorders. Trends
    Pharmacol. Sci., 24, 580-588.
12. Nilsson,A., Falth,M., Zhang,X. et al. (2009) Striatal alterations of
    secretogranin-1, somatostatin, prodynorphin, and cholecystokinin peptides
    in an experimental mouse model of Parkinson disease. Mol. Cell.
    Proteomics, 8, 1094-1104.
13. Zamyatnin,A.A., Borchikov,A.S., Vladimirov,M.G. et al. (2006) The
    EROP-Moscow oligopeptide database. Nucleic Acids Res., 34, D261-D266.
14. Falth,M., Skold,K., Norrman,M. et al. (2006) SwePep, a database designed
    for endogenous peptides and mass spectrometry. Mol. Cell. Proteomics, 5,
    998-1005.
15. Liu,F., Baggerman,G., Schoofs,L. et al. (2008) The construction of a
    bioactive peptide database in Metazoa. J. Proteome Res., 7, 4119-4131.
16. Shtatland,T., Guettler,D., Kossodo,M. et al. (2007) PepBank - a database
    of peptides based on sequence text mining and public peptide data
    sources. BMC Bioinformatics, 8, 280.
17. Wang,G., Li,X. and Wang,Z. (2009) APD2: the updated antimicrobial peptide
    database and its application in peptide design. Nucleic Acids Res., 37,
    D933-D937.
18. Thomas,S., Karnik,S., Barai,R.S. et al. (2010) CAMP: a useful resource
    for research on antimicrobial peptides. Nucleic Acids Res., 38,
    D774-D780.
19. Seshadri Sundararajan,V., Gabere,M.N., Pretorius,A. et al. (2012) DAMPD:
    a manually curated antimicrobial peptide database. Nucleic Acids Res.,
    40, D1108-D1112.
20. Kim,Y., Bark,S., Hook,V. et al. (2011) NeuroPedia: neuropeptide database
    and spectral library. Bioinformatics, 27, 2772-2773.
21. Nassel,D.R. (2002) Neuropeptides in the nervous system of Drosophila and
    other insects: multiple roles as neuromodulators and neurohormones. Prog.
    Neurobiol., 68, 1-84.
22. O’Shea,M. and Schaffer,M. (1985) Neuropeptide function: the invertebrate
    contribution. Annu. Rev. Neurosci., 8, 171-198.
23. Hummon,A.B., Amare,A. and Sweedler,J.V. (2006) Discovering new
    invertebrate neuropeptides using mass spectrometry. Mass Spectrom. Rev.,
    25, 77-98.
24. Dimmer,E.C., Huntley,R.P., Alam-Faruque,Y. et al. (2012) The UniProt-GO
    annotation database in 2011. Nucleic Acids Res., 40, D565-D570.
25. Altschul,S.F., Gish,W., Miller,W. et al. (1990) Basic local alignment
    search tool. J. Mol. Biol., 215, 403-410.
26. Thompson,J.D., Higgins,D.G. and Gibson,T.J. (1994) CLUSTAL W: improving
    the sensitivity of progressive multiple sequence alignment through
    sequence weighting, position-specific gap penalties and weight matrix
    choice. Nucleic Acids Res., 22, 4673-4680.




